Explore moe #26

Open · wants to merge 13 commits into main
Conversation

@xzyaoi (Collaborator) commented Mar 15, 2024

No description provided.

@xzyaoi xzyaoi marked this pull request as draft March 15, 2024 09:49
@xzyaoi xzyaoi closed this Dec 4, 2024
@xzyaoi xzyaoi reopened this Dec 4, 2024
@xzyaoi xzyaoi changed the base branch from init to main December 4, 2024 10:30
@xzyaoi xzyaoi marked this pull request as ready for review December 4, 2024 10:30
@xzyaoi xzyaoi requested a review from Copilot December 4, 2024 15:36
Copilot AI left a comment

Copilot reviewed 10 out of 22 changed files in this pull request and generated 7 suggestions.

Files not reviewed (12)
  • notebooks/explore.ipynb: Evaluated as low risk
  • deltazip/modeling/_base.py: Evaluated as low risk
  • deltazip/modeling/__init__.py: Evaluated as low risk
  • deltazip/__init__.py: Evaluated as low risk
  • deltazip/modeling/auto.py: Evaluated as low risk
  • deltazip/modeling/_utils.py: Evaluated as low risk
  • deltazip/lossless/compressor.py: Evaluated as low risk
  • deltazip/modeling/_const.py: Evaluated as low risk
  • deltazip/core/sparsegpt.py: Evaluated as low risk
  • notebooks/playground.py: Evaluated as low risk
  • deltazip/modeling/gpt_neox_moe.py: Evaluated as low risk
  • deltazip/modeling/moe/base_generation_strategies.py: Evaluated as low risk
Comments skipped due to low confidence (4)

cli/chat_moe.py:27

  • The args object is not defined within the chat function. Use model_path instead.
delta_model = AutoDeltaZipModelForCausalLM.from_compressed(args.model_path, strict=True, device="cpu", unpack=True, trust_remote_code=True)
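
A minimal sketch of the fix, assuming the chat function receives model_path as a parameter (the signature below is inferred from the comment, not taken from the PR):

```python
# Hypothetical sketch: `chat` takes model_path directly, so the undefined
# `args` object is not needed at this call site.
def chat(model_path: str):
    delta_model = AutoDeltaZipModelForCausalLM.from_compressed(
        model_path,  # was: args.model_path
        strict=True,
        device="cpu",
        unpack=True,
        trust_remote_code=True,
    )
    return delta_model
```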

cli/chat_moe.py:49

  • The device parameter in TextGenerationPipeline should be an integer, not a torch.device object. Use device=0 instead.
delta_model.to(torch.device("cuda"))
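
For context, older transformers releases expect an integer GPU index for a pipeline's device argument (0 for the first GPU, -1 for CPU), while model.to() accepts a torch.device either way; a hedged sketch, assuming a tokenizer variable exists in the script:

```python
import torch
from transformers import TextGenerationPipeline

# Moving the model itself with a torch.device is fine:
delta_model.to(torch.device("cuda"))

# For the pipeline, an integer index is the portable choice; newer
# transformers versions also accept a str or torch.device here.
pipeline = TextGenerationPipeline(model=delta_model, tokenizer=tokenizer, device=0)
```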

cli/compress_moe.py:99

  • The assertion should be assert base_model is None or is_moe, "You can only compress a moe without a base representation.".
assert base_model is None and is_moe, "You can only compress a moe without a base representation."
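
The two forms differ on two of the four input combinations; a small standalone illustration:

```python
# Truth table for the original "and" condition versus the suggested "or":
#   base_model=None,    is_moe=True   -> and: passes, or: passes
#   base_model=None,    is_moe=False  -> and: fails,  or: passes
#   base_model=<model>, is_moe=True   -> and: fails,  or: passes
#   base_model=<model>, is_moe=False  -> and: fails,  or: fails
for base_model, is_moe in [(None, True), (None, False), ("m", True), ("m", False)]:
    print(base_model is None and is_moe, base_model is None or is_moe)
```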

cli/compress_moe.py:105

  • [nitpick] The repeated block of code for loading the model should be refactored to avoid duplication.
if args.target_model == "gpt_neox_moe": ... elif args.target_model == "llama_moe": ... else: ...
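
One way to collapse the if/elif chain, as a sketch only (the loader names below are hypothetical placeholders, not functions from this PR):

```python
# Hypothetical dispatch table: each branch of the original if/elif chain
# becomes one entry mapping a target-model name to a loader callable.
MODEL_LOADERS = {
    "gpt_neox_moe": load_gpt_neox_moe,  # placeholder, not defined in the PR
    "llama_moe": load_llama_moe,        # placeholder, not defined in the PR
}

def load_target_model(target_model: str):
    loader = MODEL_LOADERS.get(target_model)
    if loader is None:
        raise ValueError(f"Unsupported target model: {target_model}")
    return loader()
```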

delta_model = None
config = None
if model_type == "gpt-neox-moe":
    with open(f"{args.model_path}/base/base_model/config.json", "r") as fp:

Copilot AI Dec 4, 2024

The variable 'args' is not in scope here; use 'model_path' instead of 'args.model_path'.

Suggested change
- with open(f"{args.model_path}/base/base_model/config.json", "r") as fp:
+ with open(f"{model_path}/base/base_model/config.json", "r") as fp:

base_weights = load_file(f"{model_path}/base/base_weights.safetensors")

delta_model = AutoDeltaZipModelForCausalLM.from_compressed(
    args.model_path, strict=True, device="cpu", unpack=True, trust_remote_code=True, model_config=config, custom_model=delta_model

Copilot AI Dec 4, 2024

The variable 'args' is not in scope here; use 'model_path' instead of 'args.model_path'.

Suggested change
- args.model_path, strict=True, device="cpu", unpack=True, trust_remote_code=True, model_config=config, custom_model = delta_model
+ model_path, strict=True, device="cpu", unpack=True, trust_remote_code=True, model_config=config, custom_model = delta_model

logger.info("Tokenizer loaded")

logger.info("Loading base_model")
base_model = transformers.AutoModelForCausalLM.from_pretrained(f"{model_path}/base/base_model.pt", trust_remote_code=True)

Copilot AI Dec 4, 2024

The path should be a directory containing model configurations, not a .pt file.

Suggested change
- base_model = transformers.AutoModelForCausalLM.from_pretrained(f"{model_path}/base/base_model.pt", trust_remote_code=True)
+ base_model = transformers.AutoModelForCausalLM.from_pretrained(f"{model_path}/base", trust_remote_code=True)
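
If base_model.pt is actually a raw checkpoint rather than a directory, the two loading mechanisms would be combined; a sketch under that assumption:

```python
import torch
import transformers

# from_pretrained expects a directory containing config.json and weights;
# a bare .pt state dict is instead loaded and applied to the model manually.
base_model = transformers.AutoModelForCausalLM.from_pretrained(
    f"{model_path}/base", trust_remote_code=True
)
state_dict = torch.load(f"{model_path}/base/base_model.pt", map_location="cpu")
base_model.load_state_dict(state_dict)
```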

print(f"base_weights: {base_weights.keys()}")

for expert_name, expert_weight in base_weights.items():
    prefix, suffix = expert_name.split(EXPERT_ID_PLACEHOLDER)

Copilot AI Dec 4, 2024

The tuple unpacking will raise a ValueError if EXPERT_ID_PLACEHOLDER is not found in expert_name, since split then returns a single-element list; this case should be handled explicitly.

Suggested change
- prefix, suffix = expert_name.split(EXPERT_ID_PLACEHOLDER)
+ if EXPERT_ID_PLACEHOLDER not in expert_name: continue
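
Applied verbatim, the suggestion would drop the unpacking itself; a sketch that keeps both the guard and the split:

```python
for expert_name, expert_weight in base_weights.items():
    # split() returns a single-element list when the placeholder is absent,
    # so unpacking into (prefix, suffix) would raise ValueError; skip instead.
    if EXPERT_ID_PLACEHOLDER not in expert_name:
        continue
    prefix, suffix = expert_name.split(EXPERT_ID_PLACEHOLDER)
```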

from deltazip.modeling._const import EXPERT_ID_PLACEHOLDER
from loguru import logger
from safetensors.torch import save_file
import safetensors

Copilot AI Dec 4, 2024

[nitpick] The safetensors module is imported but not used directly. Use it directly or remove the import.
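
If the bare import is kept, one direct use would be lazy tensor loading via safe_open; a sketch only, with an illustrative file path:

```python
import safetensors

# safe_open reads tensors on demand rather than materializing the whole
# file, which would give the module-level import a purpose.
with safetensors.safe_open("base_weights.safetensors", framework="pt", device="cpu") as f:
    for key in f.keys():
        tensor = f.get_tensor(key)
```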

# Make sure we only save the non-fc layers (i.e. the layers where MoE isn't applied)
for name in to_remove:
    del sd[name]
model.save_pretrained(f"{args.outdir}/base/base_model", state_dict=sd)

Copilot AI Dec 4, 2024

The save_pretrained method does not accept a state_dict parameter. Set the state_dict directly on the model before calling save_pretrained.

Suggested change
- model.save_pretrained(f"{args.outdir}/base/base_model", state_dict=sd)
+ model.load_state_dict(sd)
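
Applied verbatim, the suggestion would remove the save call entirely, and recent transformers releases do document a state_dict keyword on save_pretrained, so the premise may not hold on newer versions. A sketch of the two-step variant the comment describes:

```python
# Two-step variant: apply the pruned state dict to the model, then save.
# strict=False tolerates the keys deleted from sd above.
model.load_state_dict(sd, strict=False)
model.save_pretrained(f"{args.outdir}/base/base_model")
```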

layer_type = "LlamaDecoderLayer"
layers_block_name = "model.layers"
outside_layer_modules = ["model.embed_tokens", "model.norm"]
inside_layer_modules = [f"moe.mlp.{EXPERT_ID_PLACEHOLDER}.up_proj", f"moe.mlp.{EXPERT_ID_PLACEHOLDER}.gate_proj", f"moe.mlp.{EXPERT_ID_PLACEHOLDER}.down_proj"],

Copilot AI Dec 4, 2024

The inside_layer_modules attribute is defined as a tuple due to the trailing comma. It should be a list to avoid potential issues.

Suggested change
- inside_layer_modules = [f"moe.mlp.{EXPERT_ID_PLACEHOLDER}.up_proj", f"moe.mlp.{EXPERT_ID_PLACEHOLDER}.gate_proj", f"moe.mlp.{EXPERT_ID_PLACEHOLDER}.down_proj"],
+ inside_layer_modules = [f"moe.mlp.{EXPERT_ID_PLACEHOLDER}.up_proj", f"moe.mlp.{EXPERT_ID_PLACEHOLDER}.gate_proj", f"moe.mlp.{EXPERT_ID_PLACEHOLDER}.down_proj"]
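
The trailing comma wraps the assignment in a one-element tuple, which is easy to reproduce in isolation:

```python
# A trailing comma after the list literal creates a 1-tuple around it:
a = [1, 2, 3],
b = [1, 2, 3]
print(type(a))  # <class 'tuple'>
print(type(b))  # <class 'list'>
```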
